[Celestica][Ladakh800bcls] fix check error 'laneStat.get_ber() <= laneStat.get_maxBer()' in Prbs_ASIC_P31_TO_ASIC_P31.prbsSanity#978

Open
gang-tao wants to merge 1 commit into facebook:main from gang-tao:ladakh800bcls_agent_fix8
@gang-tao gang-tao commented Mar 4, 2026

Pre-submission checklist

  • [ ✓] I've run the linters locally and fixed lint errors related to the files I modified in this PR. You can install the linters by running pip install -r requirements-dev.txt && pre-commit install
  • [ ✓] pre-commit run
    clang-format.............................................................Passed
    black................................................(no files to check)Skipped
    shellcheck...........................................(no files to check)Skipped
    shfmt................................................(no files to check)Skipped
    trim trailing whitespace.................................................Passed
    fix end of files.........................................................Passed
    check yaml...........................................(no files to check)Skipped
    check json...........................................(no files to check)Skipped
    check for merge conflicts................................................Passed
    ruff check...........................................(no files to check)Skipped

Summary

During the link test, Prbs_ASIC_P31_TO_ASIC_P31.prbsSanity failed with the following error:

/var/FBOSS/fboss/fboss/agent/test/link_tests/AgentEnsemblePrbsTest.cpp:478: Failure
Value of: laneStat.get_ber() <= laneStat.get_maxBer()
Actual: false
Expected: true

[ FAILED ] Prbs_ASIC_P31_TO_ASIC_P31.prbsSanity (826371 ms)
[----------] 1 test from Prbs_ASIC_P31_TO_ASIC_P31 (826371 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (826371 ms total)
[ PASSED ] 0 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] Prbs_ASIC_P31_TO_ASIC_P31.prbsSanity

Motivation

The direct cause of the test failure is that the average BER exceeded the recorded maximum BER. BER values are computed periodically in handleLockWithErrors (see fboss/agent/hw/common/PrbsStatsEntry.h) and reported through getPrbsStats.

In handleLockWithErrors, num_errors is defined as a uint32_t. The BER calculation is performed as follows:

    double ber = (num_errors * 1000) / (rate_ * duration.count());

When num_errors is large (above 4,294,967, assuming a 32-bit unsigned int), the expression num_errors * 1000 is evaluated in 32-bit unsigned arithmetic and wraps around modulo 2^32 before the result is converted to a double. This yields an incorrectly small ber value, so maxBer_ is not updated for that interval. Consequently, the final average BER can end up higher than the recorded maximum BER, which is exactly the condition the test rejects.
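The wraparound can be reproduced in isolation. The following is a minimal sketch, assuming a platform where uint32_t is a 32-bit unsigned int; the helper name scaledErrorsU32 is hypothetical and only mirrors the numerator of the expression above, it is not part of FBOSS.

```cpp
#include <cstdint>

// Hypothetical helper: mirrors the numerator `num_errors * 1000` from the
// original expression. The literal 1000 is an int, so the multiplication is
// carried out in 32-bit unsigned arithmetic and wraps modulo 2^32.
constexpr std::uint32_t scaledErrorsU32(std::uint32_t numErrors) {
  return numErrors * 1000;
}

// Largest count whose product still fits: 4'294'967 * 1000 < 2^32.
static_assert(scaledErrorsU32(4'294'967u) == 4'294'967'000u, "still fits");

// One more error and the product wraps: 4'294'968'000 - 2^32 == 704.
static_assert(scaledErrorsU32(4'294'968u) == 704u, "wraps to a tiny value");

// 5'000'000 errors: 5'000'000'000 - 2^32 == 705'032'704, roughly 7x too small.
static_assert(scaledErrorsU32(5'000'000u) == 705'032'704u, "wrapped numerator");
```

The static_assert lines are checked at compile time, so the snippet documents the threshold without needing to be run.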

Solution:

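The most straightforward fix is to make the multiplier a floating-point literal (1000.0 instead of 1000). num_errors is then converted to double before the multiplication, so the entire expression is evaluated in floating point and the numerator can no longer wrap.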

    double ber = (num_errors * 1000.0) / (rate_ * duration.count());
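To illustrate why the wrapped numerator breaks the average <= max check, here is a simplified model of the bookkeeping. It is a sketch under assumptions, not the FBOSS implementation: the struct LaneBerModel, its fields, the double type for the rate, the millisecond durations, and the way the average is accumulated are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for per-lane PRBS statistics.
struct LaneBerModel {
  double rate;                    // assumed: line rate, stored as a double
  double maxBer = 0.0;            // largest per-interval BER seen so far
  std::uint64_t totalErrors = 0;  // running error count for the average
  std::int64_t totalMs = 0;       // running duration in milliseconds

  // useFix == true uses the 1000.0 literal; false reproduces the original
  // 32-bit multiplication, which wraps for large error counts.
  void addSample(std::uint32_t numErrors, std::int64_t durationMs, bool useFix) {
    const double numerator = useFix ? numErrors * 1000.0
                                    : static_cast<double>(numErrors * 1000);
    const double ber = numerator / (rate * durationMs);
    maxBer = std::max(maxBer, ber);
    totalErrors += numErrors;
    totalMs += durationMs;
  }

  // Average over all samples, computed in floating point.
  double averageBer() const {
    return (totalErrors * 1000.0) / (rate * totalMs);
  }
};

// Feeds one large and one small sample; returns true when average <= max.
inline bool invariantHolds(bool useFix) {
  LaneBerModel lane{1e9};  // illustrative rate, not a measured value
  lane.addSample(5'000'000u, 1000, useFix);
  lane.addSample(1'000u, 1000, useFix);
  return lane.averageBer() <= lane.maxBer;
}

inline void checkScenario() {
  assert(!invariantHolds(false));  // wrapped numerator understates the max
  assert(invariantHolds(true));    // floating-point numerator keeps max >= average
}
```

In this model the large sample should contribute a BER of 5e-3 but the wrapped numerator yields about 7e-4, which is below the 2.5e-3 average, so the check fails only on the original expression.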

Test Plan

command

for filter in Prbs_ASIC_P31_TO_ASIC_P31.prbsSanity ; do
  TS=$(date +%Y%m%d_%H%M%S)
  LD_LIBRARY_PATH=/opt/fboss/lib/ /root/gangtao/img/sai_multi_link_test-sai_impl \
    --gtest_filter=${filter} \
    --config /root/sarah/config/link_test_config-1000  \
    --multi_npu_platform_mapping \
    --logging=DBG5 \
    2>&1 | tee /root/gangtao/log/link_test/sai_multi_link_test-${TS}.log
done

result

==> sai_multi_link_test-20260304_041635.log <==
E0304 04:27:35.898058 285320 FbossEventBase.h:58] runImmediatelyOrRunInFbossEventBaseThreadAndWait for non-running SwSwitchUpdateEventBase FbossEventBase.
E0304 04:27:35.898062 285320 FbossEventBase.h:58] runImmediatelyOrRunInFbossEventBaseThreadAndWait for non-running SwSwitchUpdateEventBase FbossEventBase.
E0304 04:27:35.898068 285320 FbossEventBase.h:58] runImmediatelyOrRunInFbossEventBaseThreadAndWait for non-running SwSwitchUpdateEventBase FbossEventBase.
E0304 04:27:35.899046 285320 FbossEventBase.h:58] runImmediatelyOrRunInFbossEventBaseThreadAndWait for non-running SwSwitchUpdateEventBase FbossEventBase.
[       OK ] Prbs_ASIC_P31_TO_ASIC_P31.prbsSanity (660082 ms)
[----------] 1 test from Prbs_ASIC_P31_TO_ASIC_P31 (660082 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (660082 ms total)
[  PASSED  ] 1 test.

==> sai_multi_link_test-20260304_044333.log <==
E0304 04:55:08.998362 288567 FbossEventBase.h:58] runImmediatelyOrRunInFbossEventBaseThreadAndWait for non-running SwSwitchUpdateEventBase FbossEventBase.
E0304 04:55:08.998367 288567 FbossEventBase.h:58] runImmediatelyOrRunInFbossEventBaseThreadAndWait for non-running SwSwitchUpdateEventBase FbossEventBase.
E0304 04:55:08.998373 288567 FbossEventBase.h:58] runImmediatelyOrRunInFbossEventBaseThreadAndWait for non-running SwSwitchUpdateEventBase FbossEventBase.
E0304 04:55:08.999394 288567 FbossEventBase.h:58] runImmediatelyOrRunInFbossEventBaseThreadAndWait for non-running SwSwitchUpdateEventBase FbossEventBase.
[       OK ] Prbs_ASIC_P31_TO_ASIC_P31.prbsSanity (695448 ms)
[----------] 1 test from Prbs_ASIC_P31_TO_ASIC_P31 (695448 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (695448 ms total)
[  PASSED  ] 1 test.

==> sai_multi_link_test-20260304_050635.log <==
E0304 05:19:00.920533 296934 FbossEventBase.h:58] runImmediatelyOrRunInFbossEventBaseThreadAndWait for non-running SwSwitchUpdateEventBase FbossEventBase.
E0304 05:19:00.920539 296934 FbossEventBase.h:58] runImmediatelyOrRunInFbossEventBaseThreadAndWait for non-running SwSwitchUpdateEventBase FbossEventBase.
E0304 05:19:00.920546 296934 FbossEventBase.h:58] runImmediatelyOrRunInFbossEventBaseThreadAndWait for non-running SwSwitchUpdateEventBase FbossEventBase.
E0304 05:19:00.921723 296934 FbossEventBase.h:58] runImmediatelyOrRunInFbossEventBaseThreadAndWait for non-running SwSwitchUpdateEventBase FbossEventBase.
[       OK ] Prbs_ASIC_P31_TO_ASIC_P31.prbsSanity (745439 ms)
[----------] 1 test from Prbs_ASIC_P31_TO_ASIC_P31 (745439 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (745439 ms total)
[  PASSED  ] 1 test.

@gang-tao gang-tao requested a review from a team as a code owner March 4, 2026 04:32
@meta-cla meta-cla bot added the CLA Signed label Mar 4, 2026